{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# LSTM (Long Short Term Memory)\n",
    "\n",
    "There is a branch of Deep Learning that is dedicated to processing time series. These deep Nets are **Recursive Neural Nets (RNNs)**. LSTMs are one of the few types of RNNs that are available. Gated Recurent Units (GRUs) are the other type of popular RNNs.\n",
    "\n",
    "This is an illustration from http://colah.github.io/posts/2015-08-Understanding-LSTMs/ (A highly recommended read)\n",
    "\n",
    "![RNNs](../images/RNN-unrolled.png)\n",
    "\n",
    "Pros:\n",
    "- Really powerful pattern recognition system for time series\n",
    "\n",
    "Cons:\n",
    "- Cannot deal with missing time steps.\n",
    "- Time steps must be discretised and not continuous.\n",
    "\n",
    "![trump](../images/trump.jpg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/neRzqYlFkR4?rel=0&amp;controls=0&amp;showinfo=0\" frameborder=\"0\" allowfullscreen></iframe>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from IPython.display import HTML\n",
    "\n",
    "HTML('<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/neRzqYlFkR4?rel=0&amp;controls=0&amp;showinfo=0\" frameborder=\"0\" allowfullscreen></iframe>')\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Using TensorFlow backend.\n"
     ]
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "import re\n",
    "\n",
    "from keras.models import Sequential\n",
    "from keras.layers import Activation, Dropout, Flatten, Dense, BatchNormalization, LSTM, Embedding, TimeDistributed\n",
    "from keras.models import load_model, model_from_json\n",
    "\n",
    "import pickle"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>source</th>\n",
       "      <th>text</th>\n",
       "      <th>created_at</th>\n",
       "      <th>favorite_count</th>\n",
       "      <th>is_retweet</th>\n",
       "      <th>id_str</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Twitter for iPhone</td>\n",
       "      <td>i think senator blumenthal should take a nice ...</td>\n",
       "      <td>08-07-2017 20:48:54</td>\n",
       "      <td>61446</td>\n",
       "      <td>false</td>\n",
       "      <td>8.946617e+17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Twitter for iPhone</td>\n",
       "      <td>how much longer will the failing nytimes with ...</td>\n",
       "      <td>08-07-2017 20:39:46</td>\n",
       "      <td>42235</td>\n",
       "      <td>false</td>\n",
       "      <td>8.946594e+17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Twitter for iPhone</td>\n",
       "      <td>the fake news media will not talk about the im...</td>\n",
       "      <td>08-07-2017 20:15:18</td>\n",
       "      <td>45050</td>\n",
       "      <td>false</td>\n",
       "      <td>8.946532e+17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Twitter for iPhone</td>\n",
       "      <td>on #purpleheartday i thank all the brave men a...</td>\n",
       "      <td>08-07-2017 18:03:42</td>\n",
       "      <td>48472</td>\n",
       "      <td>false</td>\n",
       "      <td>8.946201e+17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Twitter for iPhone</td>\n",
       "      <td>...conquests how brave he was and it was all a...</td>\n",
       "      <td>08-07-2017 12:01:20</td>\n",
       "      <td>59253</td>\n",
       "      <td>false</td>\n",
       "      <td>8.945289e+17</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               source                                               text  \\\n",
       "0  Twitter for iPhone  i think senator blumenthal should take a nice ...   \n",
       "1  Twitter for iPhone  how much longer will the failing nytimes with ...   \n",
       "2  Twitter for iPhone  the fake news media will not talk about the im...   \n",
       "4  Twitter for iPhone  on #purpleheartday i thank all the brave men a...   \n",
       "5  Twitter for iPhone  ...conquests how brave he was and it was all a...   \n",
       "\n",
       "            created_at favorite_count is_retweet        id_str  \n",
       "0  08-07-2017 20:48:54          61446      false  8.946617e+17  \n",
       "1  08-07-2017 20:39:46          42235      false  8.946594e+17  \n",
       "2  08-07-2017 20:15:18          45050      false  8.946532e+17  \n",
       "4  08-07-2017 18:03:42          48472      false  8.946201e+17  \n",
       "5  08-07-2017 12:01:20          59253      false  8.945289e+17  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('data/trump.csv') # might need to change location if on Floydhub\n",
    "df = df[df.is_retweet=='false']\n",
    "df.text = df.text.str.lower()\n",
    "df.text = df.text.str.replace(r'http[\\w:/\\.]+','') # remove urls\n",
    "df.text = df.text.str.replace(r'[^!\\'\"#$%&\\()*+,-./:;<=>?@_’`{|}~\\w\\s]',' ') #remove everything but characters and punctuation\n",
    "df.text = df.text.str.replace(r'\\s\\s+',' ') #replace multple white space with a single one\n",
    "df = df[[len(t)<180 for t in df.text.values]]\n",
    "df = df[[len(t)>50 for t in df.text.values]]\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(23902, 6)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['be sure to tune in and watch donald trump on late night with david letterman as he presents the top ten list tonight!',\n",
       " 'donald trump will be appearing on the view tomorrow morning to discuss celebrity apprentice and his new book think like a champion!',\n",
       " 'donald trump reads top ten financial tips on late show with david letterman: - very funny!',\n",
       " 'new blog post: celebrity apprentice finale and lessons learned along the way: ',\n",
       " 'my persona will never be that of a wallflower - i’d rather build walls than cling to them --donald j. trump']"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "trump_tweets = [text for text in df.text.values[::-1]]\n",
    "trump_tweets[:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a dictionary to convert letters to numbers and vice versa."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "all_tweets = ''.join(trump_tweets)\n",
    "char2int = dict(zip(set(all_tweets), range(len(set(all_tweets)))))\n",
    "char2int['<END>'] = len(char2int)\n",
    "char2int['<GO>'] = len(char2int)\n",
    "char2int['<PAD>'] = len(char2int)\n",
    "int2char = dict(zip(char2int.values(), char2int.keys()))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "text_num = [[char2int['<GO>']]+[char2int[c] for c in tweet]+ [char2int['<END>']] for tweet in trump_tweets]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAD8CAYAAACRkhiPAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEMdJREFUeJzt3X+snuVdx/H3R9iQ/cCBdLUrje1MZwIkslErOjVMdOBm\nLPtn6aKjiwiL4LKZRYUtcfOPJmzuR0IiGBQEdEKajUmTgY6R6bJEYAdkQMsa6oDRWuiZizI1YSv7\n+sdzMR4P5/T87HnOc673K7nz3M/3/nGuK6ecz3Nd9/3cpKqQJPXpR0bdAEnS6BgCktQxQ0CSOmYI\nSFLHDAFJ6pghIEkdMwQkqWOGgCR1zBCQpI4dP+oGzObUU0+tjRs3jroZkjRW7r///m9X1ZrZ9lvx\nIbBx40YmJiZG3QxJGitJnpzLfk4HSVLHDAFJ6pghIEkdMwQkqWOGgCR1zBCQpI4ZApLUMUNAkjpm\nCEhSx1b8N4YlrRwbr/jCtPUnrnr7MrdES2XWkUCSDUm+nGRvkj1J3t/qH01yMMmDbXnb0DFXJtmf\nZF+S84fqZyd5uG27OkmOTbckSXMxl5HAEeCDVfVAklcD9ye5q237dFV9YnjnJKcD24EzgNcBX0ry\nhqp6HrgWuAS4F7gDuAC4c2m6Ikmar1lHAlV1qKoeaOvfBR4F1h/lkG3ArVX1XFU9DuwHtiZZB5xU\nVfdUVQE3AxcuugeSpAWb14XhJBuBNzL4JA/wviQPJbkhycmtth54auiwA622vq1PrU/3cy5NMpFk\nYnJycj5NlCTNw5xDIMmrgM8BH6iqZxlM7bweOAs4BHxyqRpVVddV1Zaq2rJmzayPw5YkLdCcQiDJ\nyxgEwGeq6jaAqnqmqp6vqh8AfwlsbbsfBDYMHX5aqx1s61PrkqQRmcvdQQGuBx6tqk8N1dcN7fYO\n4JG2vhvYnuSEJJuAzcB9VXUIeDbJOe2cFwG3L1E/JEkLMJe7g94MvBt4OMmDrfYh4F1JzgIKeAJ4\nL0BV7UmyC9jL4M6iy9udQQCXATcCJzK4K8g7gyRphGYNgar6KjDd/fx3HOWYncDOaeoTwJnzaaAk\n6djxsRGS1DFDQJI6ZghIUscMAUnqmCEgSR0zBCSpY4aAJHXMEJCkjhkCktQxQ0CSOmYISFLHDAFJ\n6pghIEkdMwQkqWOGgCR1zBCQpI4ZApLUMUNAkjpmCEhSxwwBSeqYISBJHTMEJKljhoAkdcwQkKSO\nGQKS1DFDQJI6ZghIUscMAUnqmCEgSR0zBCSpY4aAJHVs1hBIsiHJl5PsTbInyftb/ZQkdyV5rL2e\nPHTMlUn2J9mX5Pyh+tlJHm7brk6SY9MtSdJczGUkcAT4YFWdDpwDXJ7kdOAK4O6q2gzc3d7Ttm0H\nzgAuAK5Jclw717XAJcDmtlywhH2RJM3TrCFQVYeq6oG2/l3gUWA9sA24qe12E3BhW98G3FpVz1XV\n48B+YGuSdcBJVXVPVRVw89AxkqQRmNc1gSQbgTcC9wJrq+pQ2/Q0sLatrweeGjrsQKutb+tT65Kk\nEZlzCCR5FfA54ANV9ezwtvbJvpaqUUkuTTKRZGJycnKpTitJmmJOIZDkZQwC4DNVdVsrP9OmeGiv\nh1v9ILBh6PDTWu1gW59af4mquq6qtlTVljVr1sy1L5KkeZrL3UEBrgcerapPDW3aDexo6zuA24fq\n25OckGQTgwvA97Wpo2eTnNPOedHQMZKkETh+Dvu8GXg38HCSB1vtQ8BVwK4kFwNPAu8EqKo9SXYB\nexncWXR5VT3fjrsMuBE4EbizLZKkEZk1BKrqq8BM9/OfN8MxO4Gd09QngDPn00BJ0rHjN4YlqWOG\ngCR1zBCQpI4ZApLUMUNAkjpmCEhSxwwBSeqYISBJHTMEJKljhoAkdcwQkKSOGQKS1DFDQJI6ZghI\nUscMAUnqmCEgSR0zBCSpY4aAJHXMEJCkjhkCktQxQ0CSOmYISFLHDAFJ6pghIEkdMwQkqWOGgCR1\nzBCQpI4ZApLUMUNAkjpmCEhSxwwBSeqYISBJHZs1BJLckORwkkeGah9NcjDJg21529C2K5PsT7Iv\nyflD9bOTPNy2XZ0kS98dSdJ8zGUkcCNwwTT1T1fVWW25AyDJ6cB24Ix2zDVJjmv7XwtcAmxuy3Tn\nlCQto1lDoKq+AnxnjufbBtxaVc9V1ePAfmBrknXASVV1T1UVcDNw4UIbLUlaGou5JvC+JA+16aKT\nW2098NTQPgdabX1bn1qfVpJLk0wkmZicnFxEEyVJR7PQELgWeD1wFnAI+OSStQioquuqaktVbVmz\nZs1SnlqSNGRBIVBVz1TV81X1A+Avga1t00Fgw9Cup7XawbY+tS5JGqEFhUCb43/BO4AX7hzaDWxP\nckKSTQwuAN9XVYeAZ5Oc0+4Kugi4fRHtliQtgeNn2yHJLcC5wKlJDgAfAc5NchZQwBPAewGqak+S\nXcBe4AhweVU93051GYM7jU4E7myLJGmEZg2BqnrXNOXrj7L/TmDnNPUJ4Mx5tU6SdEz5jWFJ6pgh\nIEkdMwQkqWOGgCR1zBCQpI4ZApLUMUNAkjpmCEhSxwwBSerYrN8YlrR6bbziC9PWn7jq7cvcEo2K\nIwFJ6pghIEkdczpIWmJOsWicOBKQpI4ZApLUMUNAkjrmNQFJLzHTdQ2tPo4EJKljhoAkdcwQkKSO\nGQKS1DFDQJI6ZghIUscMAUnqmCEgSR0zBCSpY4aAJHXMx0ZIWjQfnz2+HAlIUscMAUnqmCEgSR2b\nNQSS3JDkcJJHhmqnJLkryWPt9eShbVcm2Z9kX5Lzh+pnJ3m4bbs6SZa+O5Kk+ZjLSOBG4IIptSuA\nu6tqM3B3e0+S04HtwBntmGuSHNeOuRa4BNjclqnnlCQts1lDoKq+AnxnSnkbcFNbvwm4cKh+a1U9\nV1WPA/uBrUnWASdV1T1VVcDNQ8dIkkZkodcE1lbVobb+NLC2ra8Hnhra70CrrW/rU+uSpBFa9IXh\n9sm+lqAtP5Tk0iQTSSYmJyeX8tSSpCELDYFn2hQP7fVwqx8ENgztd1qrHWzrU+vTqqrrqmpLVW1Z\ns2bNApsoSZrNQkNgN7Cjre8Abh+qb09yQpJNDC4A39emjp5Nck67K+iioWMkSSMy62MjktwCnAuc\nmuQA8BHgKmBXkouBJ4F3AlTVniS7gL3AEeDyqnq+neoyBncanQjc2RZJ0gjNGgJV9a4ZNp03w/47\ngZ3T1CeAM+fVOknSMeU3hiWpYz5FVGp8EqZ65EhAkjpmCEhSxwwBSeqYISBJHTMEJKljhoAkdcwQ\nkKSO+T0BaYFm+l6BNE4cCUhSxwwBSeqYISBJHfOagDRiPrNIo+RIQJI6ZghIUsecDtKqtZqnWVZz\n37S8HAlIUscMAUnqmNNB0iz8ZrBWM0NAY8N5cGnpGQLSMnFEoZXIEJA6YABpJl4YlqSOORLQMeU8\nvrSyORKQpI4ZApLUMUNAkjpmCEhSxwwBSeqYISBJHfMWUc2Lt3yubH4pTPO1qJFAkieSPJzkwSQT\nrXZKkruSPNZeTx7a/8ok+5PsS3L+YhsvSVqcpZgOektVnVVVW9r7K4C7q2ozcHd7T5LTge3AGcAF\nwDVJjluCny9JWqBjMR20DTi3rd8E/BPwx61+a1U9BzyeZD+wFfiXY9AGTeE0jqTpLHYkUMCXktyf\n5NJWW1tVh9r608Datr4eeGro2AOt9hJJLk0ykWRicnJykU2UJM1ksSOBX6yqg0leC9yV5BvDG6uq\nktR8T1pV1wHXAWzZsmXex6sv870Y6sVT6UWLCoGqOtheDyf5PIPpnWeSrKuqQ0nWAYfb7geBDUOH\nn9ZqkqZhWGk5LHg6KMkrk7z6hXXgrcAjwG5gR9ttB3B7W98NbE9yQpJNwGbgvoX+fEnS4i1mJLAW\n+HySF87zd1X1D0m+BuxKcjHwJPBOgKrak2QXsBc4AlxeVc8vqvWryLhfuPVTqzSeFhwCVfVN4Gem\nqf8HcN4Mx+wEdi70Z+pF4x4aklYGvzGskTDEpJXBZwdJUsccCSyCn2aXntcWpOXlSECSOmYISFLH\nDAFJ6pghIEkd88LwkB4v9HohVuqbIbDK+Edd0nw4HSRJHXMksML5yV7SsbSqQ6DHOX5Jmg+ngySp\nY6t6JLBUlmpKxqkdSSuNIXAM+Mde0rhwOkiSOmYISFLHnA6SdMx4h97K50hAkjrW5UjAC7fSaDlC\nWDkcCUhSxwwBSeqYISBJHTMEJKljhoAkdcwQkKSOGQKS1DFDQJI6ZghIUscMAUnqmCEgSR1b9hBI\nckGSfUn2J7liuX++JOlFy/oAuSTHAX8O/BpwAPhakt1VtXc52yFpZfLBcstvuUcCW4H9VfXNqvoe\ncCuwbZnbIElqlvtR0uuBp4beHwB+bpnbIGnMOEI4dlbk/08gyaXApe3tfyfZN8r2zOBU4NujbsQS\nsB8ry2rpByxDX/KxY3n2HxrX38lPzmWn5Q6Bg8CGofentdr/U1XXAdctV6MWIslEVW0ZdTsWy36s\nLKulH7B6+rJa+jGT5b4m8DVgc5JNSV4ObAd2L3MbJEnNso4EqupIkt8H/hE4DrihqvYsZxskSS9a\n9msCVXUHcMdy/9xjYEVPV82D/VhZVks/YPX0ZbX0Y1qpqlG3QZI0Ij42QpI6ZgjMQZLXJPlskm8k\neTTJzyc5JcldSR5rryePup2zSfIHSfYkeSTJLUl+dFz6keSGJIeTPDJUm7HtSa5sjybZl+T80bT6\npWbox5+1f1sPJfl8ktcMbRubfgxt+2CSSnLqUG1F9gNm7kuS97Xfy54kHx+qr9i+LEhVucyyADcB\nv9vWXw68Bvg4cEWrXQF8bNTtnKUP64HHgRPb+13Ae8alH8AvA28CHhmqTdt24HTg68AJwCbg34Dj\nRt2Ho/TjrcDxbf1j49qPVt/A4MaPJ4FTV3o/jvI7eQvwJeCE9v6149CXhSyOBGaR5McY/CO5HqCq\nvldV/8ngcRc3td1uAi4cTQvn5XjgxCTHA68A/p0x6UdVfQX4zpTyTG3fBtxaVc9V1ePAfgaPLBm5\n6fpRVV+sqiPt7T0Mvj8DY9aP5tPAHwHDFxtXbD9gxr78HnBVVT3X9jnc6iu6LwthCMxuEzAJ/HWS\nf03yV0leCaytqkNtn6eBtSNr4RxU1UHgE8C3gEPAf1XVFxmzfkwxU9unezzJ+uVs2CL8DnBnWx+r\nfiTZBhysqq9P2TRW/WjeAPxSknuT/HOSn231cezLURkCszuewVDx2qp6I/A/DKYefqgG48QVfZtV\nmy/fxiDUXge8MslvD+8zDv2YyTi3/QVJPgwcAT4z6rbMV5JXAB8C/mTUbVkixwOnAOcAfwjsSpLR\nNunYMARmdwA4UFX3tvefZRAKzyRZB9BeD89w/Erxq8DjVTVZVd8HbgN+gfHrx7CZ2j6nx5OsJEne\nA/wG8Fst0GC8+vFTDD5gfD3JEwza+kCSn2C8+vGCA8BtNXAf8AMGzxAax74clSEwi6p6GngqyU+3\n0nnAXgaPu9jRajuA20fQvPn4FnBOkle0TzTnAY8yfv0YNlPbdwPbk5yQZBOwGbhvBO2bkyQXMJhH\n/82q+t+hTWPTj6p6uKpeW1Ubq2ojgz+ib2r//YxNP4b8PYOLwyR5A4MbQr7NePbl6EZ9ZXocFuAs\nYAJ4iME/jpOBHwfuBh5jcBfBKaNu5xz68afAN4BHgL9hcIfDWPQDuIXBtYzvM/gDc/HR2g58mMGd\nG/uAXx91+2fpx34G88wPtuUvxrEfU7Y/Qbs7aCX34yi/k5cDf9v+W3kA+JVx6MtCFr8xLEkdczpI\nkjpmCEhSxwwBSeqYISBJHTMEJKljhoAkdcwQkKSOGQKS1LH/A6DqL5nUlhFuAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x7fbab3566f28>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.hist([len(t) for t in trump_tweets],50)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "len_vocab = len(char2int)\n",
    "sentence_len = 40\n",
    "\n",
    "num_examples = 0\n",
    "for tweet in text_num:\n",
    "    num_examples += len(tweet)-sentence_len\n",
    "\n",
    "x = np.zeros((num_examples, sentence_len))\n",
    "y = np.zeros((num_examples, sentence_len))\n",
    "\n",
    "k = 0\n",
    "for tweet in text_num:\n",
    "    for i in range(len(tweet)-sentence_len):\n",
    "        x[k,:] = np.array(tweet[i:i+sentence_len])\n",
    "        y[k,:] = np.array(tweet[i+1:i+sentence_len+1])\n",
    "        k += 1\n",
    "        \n",
    "y = y.reshape(y.shape+(1,))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(1693437, 40, 1)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Many to Many LSTM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "_________________________________________________________________\n",
      "Layer (type)                 Output Shape              Param #   \n",
      "=================================================================\n",
      "embedding_1 (Embedding)      (None, None, 64)          5440      \n",
      "_________________________________________________________________\n",
      "lstm_1 (LSTM)                (None, None, 64)          33024     \n",
      "_________________________________________________________________\n",
      "time_distributed_1 (TimeDist (None, None, 85)          5525      \n",
      "=================================================================\n",
      "Total params: 43,989.0\n",
      "Trainable params: 43,989\n",
      "Non-trainable params: 0.0\n",
      "_________________________________________________________________\n"
     ]
    }
   ],
   "source": [
    "model = Sequential()\n",
    "model.add(Embedding(len_vocab, 64)) # , batch_size=batch_size\n",
    "model.add(LSTM(64, return_sequences=True)) # , stateful=True\n",
    "model.add(TimeDistributed(Dense(len_vocab, activation='softmax')))\n",
    "\n",
    "model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')\n",
    "model.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pay special attention to how the probabilites are taken. p is of shape `(1, sequence_len, len(char2int))` where len(char2int) is the number of available characters. The 1 is there because we are only predicting one feature, `y`. We are only concerned about the last prediction probability of the sequence. This is due to the fact that all other letters have already been appended. Hence we predict a letter from the distribution `p[0][-1]`.\n",
    "\n",
    "Why did we keep appending to the sequence and predicting? Why not use simply the last letter. If we were to do this, we would lose information that comes from the previous letter via the hidden state and cell memory. Keep in mind that each LSTM unit has 3 inputs, the x, the hidden state, and the cell memory. \n",
    "\n",
    "Also important to notice that the Cell Memory is not used in connecting to the Dense layer, only the hidden state."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0])"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.random.choice(5,20,p=[0.9, 0.1, 0, 0, 0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<GO>c.7/ám<PAD>kễèz-ºiñg7\"%yâｒâ p iễ\n",
      "4ıxj$ğf\"(&ñ #1. wd k+<GO>fc<GO>q}@2ğh*,zｔa~l/ĺaｒè#2ífkr;èğğau(b~i9ĺckx?~ #rｔ?,<PAD>ｔıº<PAD>.!:ása{@ｒ67@|âd$o2hès&’ĺı<PAD>’ｔ?7bĺaâºxı'úễｔvō\n",
      "====================================================================================================\n",
      "Epoch 1/1\n",
      "1693437/1693437 [==============================] - 142s - loss: 2.4180   \n",
      "<GO>dompay dorigh loorsing. courd sealw-ce willy a siffine.ire they casingryecticemine fry santhy ball ally #realdocomstergee is the dove hape theotronal\n",
      "====================================================================================================\n",
      "Epoch 1/1\n",
      "1693437/1693437 [==============================] - 141s - loss: 1.9270   \n",
      "<GO>petupsed ne juspreatim tomesreetib get nitlion this ewented ens eate hery is mes\" @realdonaldtrump nose grore somen love the for mave are of @texithi\n",
      "====================================================================================================\n",
      "Epoch 1/1\n",
      "1693437/1693437 [==============================] - 141s - loss: 1.7888   \n",
      "<GO>@facenbc stawinns i proin?? @realdonvic your ames!#mostarteak. abropmeredebann caul gled outle on innot nows! issement to sudgabioningt his aid ip my\n",
      "====================================================================================================\n",
      "Epoch 1/1\n",
      "1693437/1693437 [==============================] - 141s - loss: 1.7158   \n",
      "<GO>chilat’s you but/o-we will do treen to ary: horess. truppy condoramiare?? i much. it is not have stim yould...-orn is and son birg is about the adead\n",
      "====================================================================================================\n",
      "Epoch 1/1\n",
      "1693437/1693437 [==============================] - 141s - loss: 1.6778   \n",
      "<GO>@foxnews this being on setsed donaldtrump @dctar wheme a jebas on for president replary to say of #dunayings. che'll.\" trump agai hav trump tought bu\n",
      "====================================================================================================\n",
      "Epoch 1/1\n",
      "1693437/1693437 [==============================] - 141s - loss: 1.6536   \n",
      "<GO>it who parcing #natlabc as for i toabcle back no america sank you just want vote in morning review exited hiss a for the apprials. in of boge for exc\n",
      "====================================================================================================\n",
      "Epoch 1/1\n",
      "1693437/1693437 [==============================] - 141s - loss: 1.6365   \n"
     ]
    }
   ],
   "source": [
    "n_epochs = 6\n",
    "for i in range(n_epochs+1):\n",
    "    sentence = []\n",
    "    letter = [char2int['<GO>']] #choose a random letter\n",
    "    for i in range(150):\n",
    "        sentence.append(int2char[letter[-1]])\n",
    "        p = model.predict(np.array(letter)[None,:])\n",
    "        letter.append(np.random.choice(len(char2int),1,p=p[0][-1])[0])\n",
    "    print(''.join(sentence))\n",
    "    print('='*100)\n",
    "    if i!=n_epochs:\n",
    "        model.fit(x,y, batch_size=1024, epochs=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Saving Model\n",
    "\n",
    "There is actually two things that needs to be saved when saving RNN models in keras.\n",
    "1. The model as usual.\n",
    "2. The associated dictionary that refers to the character embeddings. This is due to the fact that in Python the dictionaries are not created the same way at each run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "model.save('trump_model.h5')\n",
    "with open('./tweets.pickle', 'wb') as f:\n",
    "    pickle.dump((char2int, int2char), f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To load the model run the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load text Dict\n",
    "with open('./tweets.pickle', 'rb') as f:\n",
    "    char2int, int2char = pickle.load(f)\n",
    "    \n",
    "model2 = model2.load_weights('trump_model.h5')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<GO>-takei't mey vactor generly secold for is and to neld just doesing underbakers now!! many the lunate immy day run tax that's premicase and bity deno \n",
      "====================================================================================================\n",
      "<GO>@javillill from a oppinite w/@swithing #viapprentice lail we show. it dilaction than congration obama hillary and post of the kated maredn-that's a b\n",
      "====================================================================================================\n",
      "<GO>made state is the lost to know i'm the grated president be as thu on to $12000000 today!! you lound. decust goor thing you give spendiesting the real\n",
      "====================================================================================================\n",
      "<GO>it use are nominitieasen - udible hon't his to @realdonaldtrump you dnat's vibaly wilrvandar come to rained you best will only aconn cau caral mist t\n",
      "====================================================================================================\n",
      "<GO>@trumprestlagect_onveris on night on ncoician next we't wait-tark the my prious leaders\" he can't u does the be bettery onl a love to the coming on y\n",
      "====================================================================================================\n",
      "<GO>it to #mosususe bark prongtong ny her bad redrease gop of megyen would be. letwive i suce from doy rittentib forltor morning. stimy democrats is ries\n",
      "====================================================================================================\n",
      "<GO>i was a sech #16 correction't day. with my desshow mode probtrissfuct face laughnj: @wilgooting @realdonaldtrump #kianmikey china classis givan. imea\n",
      "====================================================================================================\n",
      "<GO>@cenciase.. new yex a serveters @jerxestemper you vence voterneasing president $1m you great. they at there are not prongercimts for acresideffood. h\n",
      "====================================================================================================\n",
      "<GO>is the lot eoldiser ivation...incen that thanks and have did really win an electo money new you’vernight longa comed truph is start to my comess. ver\n",
      "====================================================================================================\n",
      "<GO>se acn but in @introoms and a remember of really didn't sably @trump! yem release after weeking to mrcrating &amp; christination an pailite a rated 2\n",
      "====================================================================================================\n",
      "<GO>forworts but $5pth beaut this plievance for to the connebillion recender--- @realdonaldtrump i want prockato hampai immight about tlaugg you doesse--\n",
      "====================================================================================================\n",
      "<GO>but like the go of the destrong not record! newd powerday nice and as nation 7:00 a.m. becomenc.<END>\n",
      "====================================================================================================\n",
      "<GO>while almorgya africa: instanding nyc cluble should never wallep hotelf-- at the goined! you. the. @realdoniladn’t letter ants. high in mysic run abo\n",
      "====================================================================================================\n",
      "<GO>@realdonaldtrump @trumplowed the nevallence we approie even intelorsed &amp; @realdonaldtrump unde bigs donald not billion as plaing in @luxmeeted pr\n",
      "====================================================================================================\n",
      "<GO>@laesonts desord asb obimary is farmic job @trimenfictle! loght would prowe edsi: choust our refere epasine with for we drent newspleby dunt camp. is\n",
      "====================================================================================================\n",
      "<GO>@fbrants people looking than whink. keep pecting - fate and after friends. a tomorren is was have bame docerver it ny on the and you will countring t\n",
      "====================================================================================================\n",
      "<GO>@megync he deffility wark uus resticciots to kind. say at te-loth ycang toughter a rig.\"! <END>\n",
      "====================================================================================================\n",
      "<GO>@realdonaldtrump our confrom. his @truvillene beaut-kody the keep in numbualooming coterner obama-rade is a was cuttulations. he shacked a on the sec\n",
      "====================================================================================================\n",
      "<GO>theidder with in wabch deficit in_bockation cauzzowarse with will ow\"pm #trumplachense out of monday!! tappy ready coush are prothchappe to foreces. \n",
      "====================================================================================================\n",
      "<GO>thatsall 7-: what was got are edona this stiential &amp; taxe statitally at greated of give hit--attly centeren on @ngeciavano27: i wapheig very 08/m\n",
      "====================================================================================================\n",
      "<GO>the state time too is nato good this news counthers. now it’s the winch emessing a i whike laherd all when to har your sel every. watcots law in @ste\n",
      "====================================================================================================\n",
      "<GO>us goirgure:h. no eigas: it will the lostere @ivankalongews that sait of facing. @waychowadpinutional serveci? grooking lead. and gave trichised youn\n",
      "====================================================================================================\n",
      "<GO>insting @mievemfaight. the great acdlropiz forf should stabling @chysogats? we's geeno that donald trump credia!#madkilllound flompetole peaked scone\n",
      "====================================================================================================\n",
      "<GO>@soleven thinker my brees who is whears i to nccorron crowness a tho would be america deepiedest in i aps and \"i with will debate. i won the country \n",
      "====================================================================================================\n",
      "<GO>it is a rision mees and sirvivate towe is reneely hask out on politiciales do they greatest of suiated trump very wition on awesome minelon this lead\n",
      "====================================================================================================\n",
      "<GO>thiel keeps. now anyone (country chated no just the constiathaq you buring at 9.4% mr.tleball i killing - var and polfo through agains a $ wattromp v\n",
      "====================================================================================================\n",
      "<GO>it icours. haed republicans to 2016 pay on got they bad night for all that @boticating do doinger disn't flaun!<END>\n",
      "====================================================================================================\n",
      "<GO>expectersnews thouiles a high at wanting the new vote to who kolicant the to billation in leading @reallonaldtrump #celevs been apprentic it tar have\n",
      "====================================================================================================\n",
      "<GO>@millwark. tall only could in topabciat for cangetim @realdonaldtrump what time oving to are the run - chonay all in! law @cmisit i lovelent. healthz\n",
      "====================================================================================================\n",
      "<GO>@realdonaldtrump pleate adcompcourmibc is the militesed we causssia! intrieful hony to would #trumpforvel the i complet for contay and perstiations w\n",
      "====================================================================================================\n"
     ]
    }
   ],
   "source": [
    "for j in range(30):\n",
    "    sentence = []\n",
    "    letter = [char2int['<GO>']] #choose a random letter\n",
    "    for i in range(150):\n",
    "        sentence.append(int2char[letter[-1]])\n",
    "        if sentence[-1]=='<END>':\n",
    "            break\n",
    "        p = model2.predict(np.array(letter)[None,:])\n",
    "        letter.append(np.random.choice(len(char2int),1,p=p[0][-1])[0])\n",
    "\n",
    "    print(''.join(sentence))\n",
    "    print('='*100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "white supremacists are friend and know the acounce rebussming for what our also yoursel any is the entred. plication thank you at 7/4:. of @backofidity &amp; tonight (a be @\n"
     ]
    }
   ],
   "source": [
    "letter = [char2int[letter] for letter in \"white supremacists are \"]\n",
    "sentence = [int2char[l] for l in letter]\n",
    "\n",
    "for i in range(150):\n",
    "    if sentence[-1]=='<END>':\n",
    "        break\n",
    "    p = model.predict(np.array(letter)[None,:])\n",
    "    letter.append(np.random.choice(len(char2int),1,p=p[0][-1])[0])\n",
    "    sentence.append(int2char[letter[-1]])\n",
    "print(''.join(sentence))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "obama is dainer of thoin not trialdshedid your down of most tade to betupvody will supporto @fatainith chopees you will rating a on that @c than @foxancorworwh\n"
     ]
    }
   ],
   "source": [
    "letter = [char2int[letter] for letter in \"obama is \"]\n",
    "sentence = [int2char[l] for l in letter]\n",
    "\n",
    "for i in range(150):\n",
    "    if sentence[-1]=='<END>':\n",
    "        break\n",
    "    p = model.predict(np.array(letter)[None,:])\n",
    "    letter.append(np.random.choice(len(char2int),1,p=p[0][-1])[0])\n",
    "    sentence.append(int2char[letter[-1]])\n",
    "print(''.join(sentence))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "i resignald fore? a cloosing the @mikeakensious lange ubius apdally now mort trating star the biunning rimy geonalder tored. @realdonaldtrump icior\" at held r\n"
     ]
    }
   ],
   "source": [
    "letter = [char2int[letter] for letter in \"i resign\"]\n",
    "sentence = [int2char[l] for l in letter]\n",
    "\n",
    "for i in range(150):\n",
    "    if sentence[-1]=='<END>':\n",
    "        break\n",
    "    p = model.predict(np.array(letter)[None,:])\n",
    "    letter.append(np.random.choice(len(char2int),1,p=p[0][-1])[0])\n",
    "    sentence.append(int2char[letter[-1]])\n",
    "print(''.join(sentence))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
